Add common cache and per-build cache by theuni · Pull Request #62 · devrandom/gitian-builder

theuni · 2014-07-26T17:02:11Z

This is merely a POC to start a discussion. I'm sure there's a nicer way of achieving the same thing.

Allow each builder to cache some files for re-use in the next build. This allows for poor-man's dependency chaining.

Additionally, add a common cache pool for all builds. This can be used for saving (for example) downloaded files to be shared between builds.

Needed for the Bitcoin build process overhaul. I'll link the PR for discussion once it's posted.

devrandom · 2014-07-30T23:22:58Z

Interesting!

What is the advantage of the common cache? It seems like it would be better for each cache item to be attributable to a specific package.

theuni · 2014-07-30T23:30:03Z

I'm sure there are other cases, but in this particular use, common cache is helpful for descriptors that fetch their own sources.

It's used in the Bitcoin overhaul because there's a buildsystem shared between gitian and the pull-tester. Since the buildsystem fetches and verifies its own sources anyway, there's no need to include them as gitian inputs. And since several descriptors share sources, it'd be senseless to fetch them for each one.

devrandom · 2014-07-30T23:43:27Z

A couple of comments:

1. Normally, sources should go in the inputs directory.
2. I originally envisioned the actual build process as having no network access. I'm surprised that there's downloading of sources as part of any descriptors.

I'm puzzled by "Since the buildsystem fetches and verifies its own sources anyway, there's no need to include them as gitian inputs". What is the downside to having the pull-tester place the sources in the inputs directory?

theuni · 2014-07-31T19:28:19Z

1: I agree, but this work is a bit outside the box. I'll try to show below why I went this route.
2: Again, agreed. But the sources only download on the first run since they're cached after that.

Neither of those is desirable, but they were sacrifices I made in order to unify things.

Would you mind giving the description of bitcoin/bitcoin#4592 a quick read? I'd like to give a real example of how all of this ties together. There's a lot going on, so I'll try to summarize as briefly as possible (hint: it won't actually be brief ;)

In the past, for Bitcoin, there's been a disconnect between what devs run, what the pull-tester tests, and what Gitian builds. I've attempted to unify those things so that the pull-tester is able to build/test exactly what Gitian will produce, minus the deterministic guarantees.

To do this, I created a build-system for them to share. This build-system builds all dependencies as needed and caches individual results. So if libfoo's build-recipe (or the build-system itself) hasn't changed since the last run, it won't be rebuilt. Instead, it will just be unpacked. This system is deterministic in its own right... the details are a bit complex, but you can assume that to be true.
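As a rough illustration of the "rebuild only if the recipe or the build-system changed" idea, here is a minimal sketch. It is not the actual dependency builder: the function names, the in-memory `cache` dict, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def cache_key(recipe_text: str, buildsystem_text: str) -> str:
    """Derive a key that changes whenever the recipe or the build-system changes."""
    h = hashlib.sha256()
    h.update(buildsystem_text.encode())
    h.update(b"\0")  # separator so the two inputs cannot run together
    h.update(recipe_text.encode())
    return h.hexdigest()


def build_or_unpack(name, recipe_text, buildsystem_text, cache, build):
    """Return (result, was_built). `cache` is a hypothetical dict-like store."""
    key = f"{name}-{cache_key(recipe_text, buildsystem_text)}"
    if key in cache:
        # Nothing relevant changed: reuse the stored result instead of rebuilding.
        return cache[key], False
    result = build(name)
    cache[key] = result
    return result, True
```

Calling `build_or_unpack` twice with the same recipe text builds once and reuses the result the second time; changing either input text produces a new key and triggers a rebuild.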

With that done, the pull-tester and Gitian can store the build-results and reuse them, rather than rebuilding each dependency every time. See here for how this is actually happening:

pull-tester: https://github.com/coryfields/bitcoin/blob/master/.travis.yml
gitian: https://github.com/coryfields/bitcoin/blob/master/contrib/gitian-descriptors/gitian-linux.yml

Note how both of them call "make -C depends", then use those results to build bitcoin.

The result is that our gitian descriptors can stay static... we don't have to sync them up with anything, and yet we know that they'll build the same thing that the pull-tester did for any particular commit. If a dependency needs to be changed, it's changed in the dependency builder.

So, all that said, here's an example of it in action:
https://github.com/coryfields/bitcoin/pull/3/files
If you check out the build-log, you'll see what's happening: https://travis-ci.org/coryfields/bitcoin/builds/31358210

Notice that the new version of qrencode was built/fetched/installed. Since nothing else depends on qrencode, nothing else had to be rebuilt. If (for example) qt had depended on qrencode, it would've been rebuilt as well against the new qrencode.

If I use gitian to build that commit, the exact same thing will happen. Any cached results from previous builds will be used so that only qrencode will have to rebuild.
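The thread does not describe how the rebuild propagates to dependents. One way to get that behavior, shown here purely as an assumption, is to fold each dependency's identifier into the identifier of anything that depends on it. The package names and version strings in the usage note are illustrative only.

```python
import hashlib


def package_ids(recipes, deps):
    """Compute an id per package from its recipe and its dependencies' ids.

    recipes: name -> recipe text; deps: name -> list of dependency names.
    Assumes the dependency graph has no cycles.
    """
    ids = {}

    def visit(name):
        if name not in ids:
            h = hashlib.sha256(recipes[name].encode())
            for dep in sorted(deps.get(name, [])):
                # A changed dependency changes the id of everything built on it.
                h.update(visit(dep).encode())
            ids[name] = h.hexdigest()
        return ids[name]

    for name in recipes:
        visit(name)
    return ids


def needs_rebuild(old_ids, new_ids):
    """Names whose id differs from the previous run (or that are new)."""
    return sorted(n for n in new_ids if old_ids.get(n) != new_ids[n])
```

With no dependency edges, bumping the qrencode recipe marks only qrencode for rebuild. If `deps` were `{"qt": ["qrencode"]}`, the same bump would mark both qrencode and qt.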

The end-result is a guarantee that gitian will build exactly what the CI is building, with no (or very little, I hope) chance of deviation. So this commit is all it takes for us to bump that dependency, have it built/verified, and have it present in a release. In less than 10 minutes.

So... I would very much like to maintain that behavior. Imo, it's a huge feature for us. However, the 2-cache system is admittedly very kludgy. Do you have any suggestions on how it could be done more elegantly?

theuni · 2014-07-31T19:33:01Z

To clarify, in case I didn't above, the pull-tester and Gitian know nothing of each other. The pull-tester runs automatically via some cloud magic, and devs use Gitian manually. I realize that it may read as though the pull-tester is using (or aware of) Gitian in some way, but that's not the case. The common factor is the dependency builder.

theuni · 2014-08-06T21:09:23Z

@devrandom Any thoughts on the above? The bitcoin dependency builder is nearly merge-ready, and I'd like to have a plan for dealing with Gitian.

devrandom · 2014-08-06T22:17:25Z

I think a build artifact cache is likely to be a good direction. However, I'd like to make sure it's clear how to use it. Would it be possible to articulate exactly how each type of cache is meant to be used?

theuni · 2014-08-06T22:50:53Z

Sure. I'll describe exactly how I've used it for bitcoin, though I'm sure there are other use-cases.

Before the cache:

Descriptor 1: Windows

Input: libfoo-source.tar.gz
Output: libfoo

Descriptor 2: Windows

Input: git repo bar
Input: Output of Descriptor 1
Output: program bar.

Descriptor 3: Linux

Input: libfoo-source.tar.gz
Output: libfoo

Descriptor 4: Linux

Input: git repo bar
Input: Output of Descriptor 3
Output: program bar.

Process: User builds descriptors 1 and 3, saves the outputs, copies them to inputs, then builds descriptors 2 and 4.

With the cache:

Descriptor 1: Windows

Cache: Check global cache for libfoo-source.tar.gz. If it doesn’t exist, fetch it.
Cache: Check per-build cache for libfoo. If it doesn’t exist, build it.
Output: program bar. Store libfoo to per-build cache and libfoo-source.tar.gz to global cache.
Input: None

Descriptor 2: Linux

Cache: Check global cache for libfoo-source.tar.gz. If it doesn’t exist, fetch it.
Cache: Check per-build cache for libfoo. If it doesn’t exist, build it.
Output: program bar. Store libfoo to per-build cache and libfoo-source.tar.gz to global cache.
Input: None

Process: User builds descriptors 1 and 2. If cached versions of libfoo are found from previous gitian runs, they will be used instead of rebuilding.

Note that the logic to determine whether a cached version can be reused is not handled here; that's up to the user to work out.

libfoo-source.tar.gz is only fetched once because when the 2nd descriptor is run, the 1st descriptor will have already put it in the global cache.
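The "with the cache" flow above can be sketched as follows. This is a hedged illustration, not gitian-builder code: the directory layout, the helper names, and the `fetch`/`build` callbacks are hypothetical, and (as noted above) deciding whether a cached item is still valid is left out.

```python
from pathlib import Path
from typing import Callable, Tuple


def ensure_cached(cache_dir: Path, name: str,
                  produce: Callable[[Path], None]) -> Tuple[Path, bool]:
    """Return (path, was_produced); call `produce` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        return target, False
    produce(target)
    return target, True


def run_descriptor(common_cache: Path, build_cache: Path,
                   fetch: Callable[[Path], None],
                   build: Callable[[Path, Path], None]) -> Tuple[bool, bool]:
    """One descriptor run; returns (fetched_source, built_libfoo)."""
    # Sources are shared by all descriptors, so they live in the common cache.
    source, fetched = ensure_cached(common_cache, "libfoo-source.tar.gz", fetch)
    # Build artifacts are platform-specific, so they live in the per-build cache.
    _, built = ensure_cached(build_cache, "libfoo",
                             lambda dest: build(source, dest))
    return fetched, built
```

Running a Windows descriptor and then a Linux descriptor against the same common cache fetches the source only once, while each per-build cache gets its own libfoo. A second Linux run neither fetches nor builds.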

devrandom · 2014-08-06T23:11:56Z

Okay, putting a concise version of this in the doc directory would be helpful. I think we can recommend that sources go in the common cache, and binary build artifacts go in the build-specific cache.

I just realized that the gitian build process can be run without a network connection if the cached sources are present.

So I can go ahead and accept the pull request if you add the docs unless you have further thoughts.

theuni · 2014-08-06T23:16:17Z

Yea, the cache can be pre-seeded to mimic the use of inputs. I was tempted to use the inputs dir itself for the global cache, but I think that might lead to some nasty accidents.

I'll do up some docs. Thanks for hearing me out!

theuni · 2014-08-07T17:00:49Z

@devrandom added a quick readme.

theuni mentioned this pull request Jul 26, 2014

Add dependencies builder for pull-tester, gitian, easy cross-dev bitcoin/bitcoin#4592

Merged

theuni added 2 commits August 7, 2014 13:01

doc: add quick cache readme

b57286a

devrandom pushed a commit that referenced this pull request Aug 7, 2014

Merge pull request #62 from theuni/cache

9092f98

devrandom merged commit 9092f98 into devrandom:master Aug 7, 2014

rnicoll mentioned this pull request Feb 15, 2015

[Auto] Add dependencies builder for pull-tester, gitian, easy cross-dev dogecoin/dogecoin#936

Merged
